منابع مشابه
A generalized Markov decision process
— In this paper we present a generalized Markov décision process that subsumes the traditional discounted, infinité horizon, finite state and action Markov décision process, VeinotCs discountéd décision processes, and Koehler's generalization of these two problem classes. Résumé. — Nous présentons dans cet article un processus de Markov généralisé qui englobe le processus de décision markovien ...
متن کاملQuantile Markov Decision Process
In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of Markov Decision Processes (MDP), to which we refers as Quantile Markov Decision Processes (QMDP). Traditionally, the goal of a Markov Decision Process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly to be infinite). In many applications, however, a decision maker may ...
متن کاملOptimizing Red Blood Cells Consumption Using Markov Decision Process
In healthcare systems, one of the important actions is related to perishable products such as red blood cells (RBCs) units that its consumption management in different periods can contribute greatly to the optimality of the system. In this paper, main goal is to enhance the ability of medical community to organize the RBCs units’ consumption in way to deliver the unit order timely with a focus ...
متن کاملA multicriteria competitive Markov decision process
In this paper, we deal with a multicriteria competitive Markov decision process. In the decision process there are two decision makers with a competitive behaviour, so they are usually called players. Their rewards are coupled because depend on the actions chosen by both players in each state of the process. We propose as solution of this game the set of Pareto-optimal security strategies for a...
متن کاملExperts in a Markov Decision Process
We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well can an agent do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: RAIRO - Operations Research
سال: 1980
ISSN: 0399-0559,1290-3868
DOI: 10.1051/ro/1980140403491